ood example
- Asia > China > Zhejiang Province > Hangzhou (0.04)
- Asia > China > Beijing > Beijing (0.04)
- North America > United States > Kansas > Ellis County > Hays (0.04)
- Asia > China > Guangdong Province > Zhuhai (0.04)
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > Singapore (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Singapore (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
A Missing Details 453 A.1 Motivations for working with model latent space
Let us now make this point more rigorous. In our experiments, we use empirical quantiles as thresholds. This is the case for all the kernels that rely on a distance (e.g. the Radial Basis Function Kernel, the Matern Knowing the category that a suspicious example belongs to, can we improve its prediction? B&I class are always the lowest among all classes. Table 4: DAUC is not the only choice in identifying OOD examples.
- Health & Medicine (0.68)
- Information Technology (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Task-Agnostic Undesirable Feature Deactivation Using Out-of-Distribution Data
A deep neural network (DNN) has achieved great success in many machine learning tasks by virtue of its high expressive power. However, its prediction can be easily biased to undesirable features, which are not essential for solving the target task and are even imperceptible to a human, thereby resulting in poor generalization. Leveraging plenty of undesirable features in out-of-distribution (OOD) examples has emerged as a potential solution for de-biasing such features, and a recent study shows that softmax-level calibration of OOD examples can successfully remove the contribution of undesirable features to the last fully-connected layer of a classifier. However, its applicability is confined to the classification task, and its impact on a DNN feature extractor is not properly investigated. In this paper, we propose Taufe, a novel regularizer that deactivates many undesirable features using OOD examples in the feature extraction layer and thus removes the dependency on the task-specific softmax layer. To show the task-agnostic nature of Taufe, we rigorously validate its performance on three tasks, classification, regression, and a mix of them, on CIFAR-10, CIFAR-100, ImageNet, CUB200, and CAR datasets. The results demonstrate that Taufe consistently outperforms the state-of-the-art method as well as the baselines without regularization.
- Research Report > Promising Solution (0.59)
- Research Report > New Finding (0.59)
Testing the General Deductive Reasoning Capacity of Large Language Models Using OOD Examples
Given the intractably large size of the space of proofs, any model that is capable of general deductive reasoning must generalize to proofs of greater complexity. Recent studies have shown that large language models (LLMs) possess some abstract deductive reasoning ability given chain-of-thought prompts. However, they have primarily been tested on proofs using modus ponens or of a specific size, and from the same distribution as the in-context examples. To measure the general deductive reasoning ability of LLMs, we test on a broad set of deduction rules and measure their ability to generalize to more complex proofs from simpler demonstrations from multiple angles: depth-, width-, and compositional generalization. To facilitate systematic exploration, we construct a new synthetic and programmable reasoning dataset that enables control over deduction rules and proof complexity. Our experiments on four LLMs of various sizes and training objectives show that they are able to generalize to compositional proofs. However, they have difficulty generalizing to longer proofs, and they require explicit demonstrations to produce hypothetical subproofs, specifically in proof by cases and proof by contradiction.
A Missing Details 453 A.1 Motivations for working with model latent space
Let us now make this point more rigorous. In our experiments, we use empirical quantiles as thresholds. This is the case for all the kernels that rely on a distance (e.g. the Radial Basis Function Kernel, the Matern Knowing the category that a suspicious example belongs to, can we improve its prediction? B&I class are always the lowest among all classes. Table 4: DAUC is not the only choice in identifying OOD examples.